NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Chain-of-Factors Paper-Reviewer Matching

https://doi.org/10.1145/3696410.3714708

Zhang, Yu; Shen, Yanzhen; Kang, SeongKu; Chen, Xiusi; Jin, Bowen; Han, Jiawei (April 2025, ACM)

Free, publicly-accessible full text available April 22, 2026
SMART: Self-Aware Agent for Tool Overuse Mitigation

https://doi.org/10.18653/v1/2025.findings-acl.239

Qian, Cheng; Acikgoz, Emre Can; Wang, Hongru; Chen, Xiusi; Sil, Avirup; Hakkani-Tür, Dilek; Tur, Gokhan; Ji, Heng (January 2025, Association for Computational Linguistics)

Full Text Available
Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly

https://doi.org/10.1109/icdew61823.2024.00040

Taylor, Alexander K; Huang, Yicong; Hao, Junheng; Lin, Xinyuan; Chen, Xiusi; Wang, Wei; Li, Chen (May 2024, IEEE)

Full Text Available
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

https://doi.org/10.18653/v1/2024.emnlp-main.498

Zhang, Yu; Chen, Xiusi; Jin, Bowen; Wang, Sheng; Ji, Shuiwang; Wang, Wei; Han, Jiawei (January 2024, Association for Computational Linguistics)

Full Text Available
Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

https://doi.org/10.1145/3580305.3599544

Zhang, Yu; Jin, Bowen; Chen, Xiusi; Shen, Yanzhen; Zhang, Yunyi; Meng, Yu; Han, Jiawei (August 2023, ACM)
Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers using only category descriptions (e.g., category names, category-indicative keywords). Existing studies on weakly supervised paper classification are less concerned with two challenges: (1) given a large, fine-grained label space, papers should be classified not only into coarse-grained research topics but also into fine-grained themes, potentially several of them; and (2) full text should be used to complement the paper title and abstract for classification. Moreover, instead of treating the entire paper as one long linear sequence, one should exploit structural information such as citation links across papers and the hierarchy of sections and paragraphs within each paper. To tackle these challenges, we propose FuTex, a framework that uses the cross-paper network structure and the in-paper hierarchy structure to classify full-text scientific papers under weak supervision. A network-aware contrastive fine-tuning module and a hierarchy-aware aggregation module are designed to leverage these two types of structural signals, respectively. Experiments on two benchmark datasets demonstrate that FuTex significantly outperforms competitive baselines and performs on par with fully supervised classifiers that use 1,000 to 60,000 ground-truth training samples.
Full Text Available
How the experience of California wildfires shape Twitter climate change framings

https://doi.org/10.1007/s10584-023-03668-0

Ko, Jessie W Y; Ni, Shengquan; Taylor, Alexander; Chen, Xiusi; Huang, Yicong; Kumar, Avinash; Alsudais, Sadeem; Wang, Zuozhi; Liu, Xiaozhen; Wang, Wei; et al (January 2024, Climatic Change)

Climate communication scientists search for effective message strategies to engage the ambivalent public in support of climate advocacy. The personal experience of wildfire is expected to make climate change impacts more concrete, pointing to a potential message strategy for engaging the public. This study examined Twitter discourse related to climate change during the onset of 20 wildfires in California between 2017 and 2021. In this mixed-method study, we analyzed tweets geographically and temporally proximal to the occurrence of wildfires to discover framings, and examined how the frequencies of climate framings changed before and after fires. Results identified three predominant climate framings: linking wildfire to climate change, suggesting climate actions, and attributing climate change to adversities besides wildfires. Mean tweet frequencies for linking wildfire to climate change and for attributing adversities increased significantly after the onset of fire. Tweets suggesting climate action also increased, but the increase was not statistically significant. Temporal analysis of tweet frequencies for the three themes showed that discussion increased after the onset of a fire but typically persisted for no more than 2 weeks. For fires that burned for more than a month, external events triggered climate discussions. Our findings contribute to identifying how the personal experience of wildfire shapes Twitter discussion related to climate change and how these framings change over time during wildfire events, yielding insights into critical time points after a wildfire for implementing message strategies that increase public engagement on climate change impacts and policy.
Full Text Available
Scalable Graph Representation Learning via Locality-Sensitive Hashing

https://doi.org/10.1145/3511808.3557689

Chen, Xiusi; Jiang, Jyun-Yu; Wang, Wei (January 2022, Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM))

Full Text Available
MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

https://doi.org/10.1145/3488560.3498384

Zhang, Yu; Garg, Shweta; Meng, Yu; Chen, Xiusi; Han, Jiawei (February 2022, WSDM'22, The Fifteenth ACM International Conference on Web Search and Data Mining, February 2022)

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of predefined categories using only category surface names, without any annotated training documents. Most existing classifiers leverage the textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., the authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in addition to textual content. In this paper, we explore the potential of using metadata to help weakly supervised text classification. Specifically, we model the relationships between documents and metadata via a heterogeneous information network. To effectively capture higher-order structures in the network, we use motifs to describe metadata combinations. We propose a novel framework, named MotifClass, which (1) selects category-indicative motif instances, (2) retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances, and (3) trains a text classifier using the pseudo training data. Extensive experiments on real-world datasets demonstrate that MotifClass outperforms existing weakly supervised text classification approaches. Further analysis shows the benefit of considering higher-order metadata information in our framework.
Full Text Available
#StayHome or #Marathon?: Social Media Enhanced Pandemic Surveillance on Spatial-temporal Dynamic Graphs

https://doi.org/10.1145/3459637.3482222

Zhou, Yichao; Jiang, Jyun-Yu; Chen, Xiusi; Wang, Wei (October 2021, The 30th ACM International Conference on Information and Knowledge Management (CIKM))

Full Text Available
Hierarchical Metadata-Aware Document Categorization under Weak Supervision

https://doi.org/10.1145/3437963.3441730

Zhang, Yu; Chen, Xiusi; Meng, Yu; Han, Jiawei (March 2021, WSDM'21, The Fourteenth ACM International Conference on Web Search and Data Mining, March 2021)
Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of hierarchical topic structures in massive text corpora. Although related studies have achieved satisfactory performance in fully supervised hierarchical document classification, they usually require massive human-annotated training data and utilize only text information. However, in many domains, (1) annotations are expensive, so only very few training samples can be acquired, and (2) documents are accompanied by metadata. Hence, this paper studies how to integrate the label hierarchy, metadata, and text signals for document categorization under weak supervision. We develop HiMeCat, an embedding-based generative framework for this task. Specifically, we propose a novel joint representation learning module that simultaneously models category dependencies, metadata information, and textual semantics, and we introduce a data augmentation module that hierarchically synthesizes training documents to complement the original small-scale training set. Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contributions of the representation learning and data augmentation modules.
Full Text Available
